Goto

Collaborating Authors

 parallel and efficient hierarchical


Parallel and Efficient Hierarchical k-Median Clustering

Neural Information Processing Systems

As a fundamental unsupervised learning task, hierarchical clustering has been extensively studied in the past decade. In particular, standard metric formulations as hierarchical $k$-center, $k$-means, and $k$-median received a lot of attention and the problems have been studied extensively in different models of computation. Despite all this interest, not many efficient parallel algorithms are known for these problems. In this paper we introduce a new parallel algorithm for the Euclidean hierarchical $k$-median problem that, when using machines with memory $s$ (for $s\in \Omega(\log^2 (n+\Delta+d))$), outputs a hierarchical clustering such that for every fixed value of $k$ the cost of the solution is at most an $O(\min\{d, \log n\} \log \Delta)$ factor larger in expectation than that of an optimal solution. Furthermore, we also get that for all $k$ simultanuously the cost of the solution is at most an $O(\min\{d, \log n\} \log \Delta \log (\Delta d n))$ factor bigger that the corresponding optimal solution.


Parallel and Efficient Hierarchical k-Median Clustering

Neural Information Processing Systems

As a fundamental unsupervised learning task, hierarchical clustering has been extensively studied in the past decade. In particular, standard metric formulations as hierarchical k -center, k -means, and k -median received a lot of attention and the problems have been studied extensively in different models of computation. Despite all this interest, not many efficient parallel algorithms are known for these problems. In this paper we introduce a new parallel algorithm for the Euclidean hierarchical k -median problem that, when using machines with memory s (for s\in \Omega(\log 2 (n \Delta d))), outputs a hierarchical clustering such that for every fixed value of k the cost of the solution is at most an O(\min\{d, \log n\} \log \Delta) factor larger in expectation than that of an optimal solution. Furthermore, we also get that for all k simultanuously the cost of the solution is at most an O(\min\{d, \log n\} \log \Delta \log (\Delta d n)) factor bigger that the corresponding optimal solution.